Open Source

1 Introduction

The Surge Capacity Assessment Tool (SCAT) is used to assess a laboratory’s readiness for surge capacity in an emergency situation. the SCAT consists of an Excel document in which the laboratory data is recorded. In case multiple Excel files need to be analysed and compared, it can be a challenging task to extract all the data from the list of SCAT Excel files.
This document describes how multiple Excel files can be the imported at once and subsequently compared and analysed. The code is written in R and some basic knowledge of the language is required to apply this code. The code is presented in chunks, which can be read by clicking the code buttons on the right. As an example, three SCAT Excel files can be used as dummies. Do not forget to remove these dummy Excel-files from the import folder before you extract your own Excel-files.

1.1 Variables

The code for extracting the information from the Excel files modifies the structure of the tables. It does not changes any recorded results. Each SCAT question is seen as a table-variable, with one row for each laboratory (or Excel-file). The SCAT questions would form very long variables and are therefore truncated, i.e. “How many additional manhours would be available to work on the post-analytical phase under emergency operations?” will be truncated to variable name man_hrs_an_add. The table below provides an overview of the modified variables for the SCAT-data collection.

BIOSEC SCAT characteristics
Variable Description Data type Range Unit
file_name name of file Character
date Date in the SCAT form Double
user_guidelines unknown NA
equipment_1 Most important equipment, first Character
equipment_2 Most important equipment, second Character
equipment_3 Most important equipment, third Character
test_max_day Maximum nr of mol. tests per day Double Numeric
operating_hrs Lab operating hrs per day, normal Double (1-24) hours
work_hrs Number of hrs per day that staff works Double (1-24) hours
man_hrs_pre Number of hrs per day pre-analytic phase Double (1-24) hours
man_hrs_an Number of hrs per day analytic phase Number (1-24) hours
man_hrs_post Number of hrs per day post-analytic phase Number (1-24) hours
equipment_[1/2/3]_normal Equipment 1/2/3 running hrs per day, normal operations Double (1-24) hours
occupation_[1/2/3]_normal Equipment 1/2/3 occupation per day, normal operations Double Percentage
operating_hrs_max Lab operating hrs per day, during emergency Double (1-24) hours
work_hrs_max Number of hrs per day that staff works, during emergency Double (1-24) hours
man_hrs_pre_add Number of hrs per day pre-analytic phase, emergency Double (1-24) hours
man_hrs_an_add Number of hrs per day analytic phase, emergency Double (1-24) hours
man_hrs_post_add Number of hrs per day post-analytic phase, emergency Double (1-24) hours
equipment_[1/2/3]_emergency Additional equipment 1, emergency Double
occupation_[1/2/3]_emergency Maximum occupation equipment 1, emergency Double Percentage
max_nr_tests Maximum number of mol. tests, emergency Double
limiting_factor Factor limiting 24 hr surge capacity Character

1.2 Libraries

Several basic R-libraries are used and loaded.

rm(list = ls())  ## remove all data from global environment.
## Set working directory
#setwd("C:/Users/hovetr/Documents/SCAT/SCAT_R") Not needed with Markdown, but perhaps when using R-Project)
## Load libraries 
library(DT)
library(htmlwidgets)
library(tidyverse)
library(janitor)
library(kableExtra)
library(plotly)

1.3 Functions

This code contains a few functions that come of use. The function round_df enables to round-off all ‘numbers’ within a data-frame to a given number, i.e. round_df(penguins, 2).

The function create_dt is based on the DT library and creates an interactive table of a data-frame.
Important Note: applying this function for a Markdown HTML document, will incorporate all data from the set within the HTML file.

#### rounding numbers
round_df <- function(df, digits) {
  nums <- vapply(df, is.numeric, FUN.VALUE = logical(1))

  df[,nums] <- round(df[,nums], digits = digits)

  (df)
}


#### Create Data table. -----
create_dt <- function(x){
  DT::datatable(x, class = 'cell-border stripe', filter = 'top',
                extensions = 'Buttons',
                options = list(dom = 'Blfrtip',
                               buttons = c('copy', 'csv', 'excel', 'pdf', 'print'),
                               lengthMenu = list(c(10,25,50,-1),
                                                 c(10,25,50,"All"))))
}

2 Import data

From the the folder import, all available Excel files with he extension .xlsm are imported into the data-frame scat_df. The data is retrieved only from the tab 24h Surge Calculator. The data from the different Excel file are merged together in one data frame. The first column with the name “Value” contains the name of the Excel file.

scat_df <- list.files(path = "input",       # Change the name of the path corresponding to the dedicated folder name
                       full.names = TRUE,   
                      recursive = TRUE,
                       pattern = "*.xlsm") %>%  # Extension name is .xlsm (not .xlsx)
  tbl_df() %>% 
  mutate('24h Surge Calculator' = map(value, readxl::excel_sheets)) %>%
  unnest('24h Surge Calculator') %>%  
  rename(sheet = '24h Surge Calculator') %>%     # Column names starting with a number are always causing trouble; rename it to 'sheet' 
  filter(sheet == "24h Surge Calculator") %>%  
mutate(myFiles = purrr::map2(value, sheet, function(x,y) {
    readxl::read_excel(x, sheet = paste(y))})) %>% 
unnest(myFiles) 

3 Clean-up data

The initial data-frame is a bit scrambled. This code chunk is cleaning up the data-frame and makes it more readable.

scat_df <- scat_df %>% 
  select(-...3)        # remove column containing only NA

#The date is imported as a number with class character
scat_df$...5 <- as.numeric(scat_df$...5) # column is changed to class numeric
scat_df$...5 <- excel_numeric_to_date(scat_df$...5) #using function from janiter, excel number is changed to data

#Change column names: makes is easier to work
scat_df <- scat_df %>% 
  rename(Question = `CoE Project 81 - 24h Surge Capacity Calculator`) %>% 
  rename(Answer = `...2`) %>% 
  rename(var4 = `...4`) %>% 
  rename(var5 = `...5`) 

# remove rows if columns 3-6 all contain NA
scat_df <- scat_df[rowSums(is.na(scat_df[,3:6]))!=4,]

# 1st part: move 'date: from column var4 to column 'Question'
# 2nd part: change date to character, then copy (if it starts with 4 digits) to column 'Question'
move_date <- function(x) { 
  x <- mutate(x, Question = 
                      ifelse(grepl('Date:', x$var4), "Date:", x$Question)
              )
  
  x$var5 <- as.character(x$var5)
  x <- mutate(x, Answer = 
                      ifelse(grepl('[[:digit:]]{4}', x$var5), x$var5, x$Answer)
              )
  
  
}
scat_df <- move_date(scat_df)

# Working with NA in ifelse, is a little bit different.
# Move date from column var to column Answer, if column Answer contains NA
scat_df$Answer <- ifelse(is.na (scat_df$Answer), 
                          scat_df$var4, 
                         scat_df$Answer)

# Drop last two columns that aren't used anymore

scat_df <- scat_df %>% 
  select(-var4, -var5)

# Delete rows with long text-strings 
scat_df <- scat_df %>% 
  filter(!grepl('^This part of the tool', Question)) 

scat_df <- scat_df %>% 
  filter(!grepl('^Before answering the', Question)) 

3.1 Overview of locations

This code-chunk extracts the City-name (or laboratory name) from the name of the file and adds it to the data-frame as a separate value. It is therefore important that the names of the Excel files in the folder are kept consistently.

scat_df$city <- sub(".*input/P81 BIOSEC_SCAT_*(.*?) *.xlsm*", "\\1", scat_df$value)

tbl_city <- unique(scat_df$city)
print(tbl_city)
## [1] "Bangkok"    "Chumpon"    "Udon_Thani"

3.2 Pivoting the tables

Step one. The problem is that in Excel, names of equipment are merged in the Questions: the same question would then be seen as different variables / columns. The first step is to make the questions uniform between the different SCAT Excel files.
For example the question: “How many LightCycler 480 PCR are used for molecular testing under normal operations?” need to be converted removing the part ‘LightCycler 480 PCR’.

scat_df <- scat_df %>% 
  group_by(value) %>% ## cut the dataset in chunks by Excel file-name (=value)
  mutate(ticker = row_number()) # add column with row number, which repeats for each 'file name'

scat_df <- ungroup(scat_df) ## undo grouping


## Rename First important equipment normal situation
scat_df <- scat_df %>% 
  mutate(Question = 
           ifelse(grepl('How many.+are used for molecular testing under normal operations\\?', Question) & (ticker == 13), "equipment_1_normal", scat_df$Question))

scat_df <- scat_df %>% 
  mutate(Question = 
           ifelse(grepl('What is the occupation of.+under normal operations\\?', Question) & (ticker == 14), "occupation_1_normal", scat_df$Question))

## Rename Second important equipment normal situation
scat_df <- scat_df %>% 
  mutate(Question = 
           ifelse(grepl('How many.+are used for molecular testing under normal operations\\?', Question) & (ticker == 15), "equipment_2_normal", scat_df$Question))

scat_df <- scat_df %>% 
  mutate(Question = 
           ifelse(grepl('What is the occupation of.+under normal operations\\?', Question) & (ticker == 16), "occupation_2_normal", scat_df$Question))

## Rename Third important equipment normal situation
scat_df <- scat_df %>% 
  mutate(Question = 
           ifelse(grepl('How many.+are used for molecular testing under normal operations\\?', Question) & (ticker == 17), "equipment_3_normal", scat_df$Question))

scat_df <- scat_df %>% 
  mutate(Question = 
           ifelse(grepl('What is the occupation of.+under normal operations\\?', Question) & (ticker == 18), "occupation_3_normal", scat_df$Question))



## Rename First important equipment emergency situation
scat_df <- scat_df %>% 
  mutate(Question = 
           ifelse(grepl('How many.+can be used additionally during emergency operations\\?', Question) & (ticker == 25), "equipment_1_emergency", scat_df$Question))

scat_df <- scat_df %>% 
  mutate(Question = 
           ifelse(grepl('What is the maximum occupation of.+\\?', Question) & (ticker == 26), "occupation_1_emergency", scat_df$Question))

## Rename Second important equipment emergency situation
scat_df <- scat_df %>% 
  mutate(Question = 
           ifelse(grepl('How many.+can be used additionally during emergency operations\\?', Question) & (ticker == 27), "equipment_2_emergency", scat_df$Question))

scat_df <- scat_df %>% 
  mutate(Question = 
           ifelse(grepl('What is the maximum occupation of.+\\?', Question) & (ticker == 28), "occupation_2_emergency", scat_df$Question))

## Rename Third important equipment emergency situation
scat_df <- scat_df %>% 
  mutate(Question = 
           ifelse(grepl('How many.+can be used additionally during emergency operations\\?', Question) & (ticker == 29), "equipment_3_emergency", scat_df$Question))

scat_df <- scat_df %>% 
  mutate(Question = 
           ifelse(grepl('What is the maximum occupation of.+\\?', Question) & (ticker == 30), "occupation_3_emergency", scat_df$Question))


scat_df <- scat_df %>% 
  select(-ticker)

Step two. The table is pivoted with one lab per row. The SCAT ‘Questions’ now become the Variable names. Most variable names (= Questions) become pretty long. Therefore, they need to be shortened. The new applied variable names are listed and described in the table in the Introduction.

# Pivoting the table
scat_df_wide <- scat_df %>% pivot_wider( 
  names_from = Question,
  values_from = Answer,
  values_fill = NA
  )

#Change column names
scat_df_wide <- scat_df_wide %>% 
  rename(file_name = value) %>% 
  rename(date = `Date:`) %>% 
  rename(user_guidelines = `User guidelines`) %>% 
  rename(equipment_1 = `1.`) %>% 
  rename(equipment_2 = `2.`) %>% 
  rename(equipment_3 = `3.`) %>% 
  rename(normal_situation = `Question regarding a normal situation`) %>%  
  rename(test_max_day = `What is the maximum number of molecular tests the laboratory can perform per day?`) %>% 
  rename(operating_hrs = `What are the operating hours of the laboratory?`) %>% 
  rename(work_hrs = `How many hours do staff work per day?`) %>% 
  rename(man_hrs_pre = `How many manhours per day does the pre-analytic phase take in a normal situation performing the maximum amount of molecular tests?`) %>% 
  rename(man_hrs_an = `How many manhours per day does the analytic phase take in a normal situation performing the maximum amount of molecular tests?`) %>% 
  rename(man_hrs_post = `How many manhours per day does the post-analytic phase take in a normal situation performing the maximum amount of molecular tests?`) %>% 
  rename(emergency_situation = `Question regarding an emergency situation`) %>% 
  rename(operating_hrs_max = `What are the maximum operating hours of the laboratory during emergency operations?`) %>% 
  rename(work_hrs_max = `How many hours per day can staff work during emergency operations?`) %>% 
  rename(man_hrs_pre_add = `How many additional manhours would be available to work on the pre-analytical phase under emergency operations?`) %>%
  rename(man_hrs_an_add = `How many additional manhours would be available to work on the analytical phase under emergency operations?`) %>%
  rename(man_hrs_post_add = `How many additional manhours would be available to work on the post-analytical phase under emergency operations?`) %>%
  rename(current_nr_tests = `Current maximum 24 hour surge capacity`) %>%
  rename(max_nr_tests_emergency = `The maximum number of molecular tests the laboratory can perform per day under emergency operations`) %>%
  rename(limiting_factor = `Factor limiting 24 hour surge capacity`)

4 Results

The data frame is exported back into an standard csv file into the folder output.

write_csv(scat_df_wide, "./output/scat_complete.csv")

The data-set can also be presented as an html interactive table. The buttons can be pressed to download the data in different formats. A subset is prepared for this table, otherwise it becomes too long.

scat_sub <- scat_df_wide %>% 
  select(file_name, city, test_max_day, max_nr_tests_emergency)

scat_sub$city <- as.factor(scat_sub$city)
scat_sub$test_max_day <- as.integer(scat_sub$test_max_day) # change variable type character to type integer
scat_sub$max_nr_tests_emergency <- as.integer(scat_sub$max_nr_tests_emergency)

scat_sub <- round_df(scat_sub, 0)


tbl_1 <- create_dt(scat_sub)

tbl_1
saveWidget(tbl_1, "./output/tbl_1.html") ## Export interactive table 1 to folder output

tests <- scat_sub %>% 
  pivot_longer(
    cols = test_max_day:max_nr_tests_emergency,
    names_to = "stage",
    values_to = "tests_nr"
  )


p1 <- plot_ly(data = tests, x = ~tests_nr , y = ~city, orientation = 'h', color = ~stage) %>% 
  add_trace(type = "bar") %>% 
  layout(
    title = "Number of tests per day in normal stage and during emergency.",
    xaxis = list(title = "Number of tests"),
    yaxis = list(title = "City name")
    ) 

p1
saveWidget(p1, "./output/plot1.html") ## Export interactive plot 1 to folder output

The table below lists an overview of the three most important equipment used for performing the molecular tests in each laboratory. The last column contains the factor limiting the 24 hour surge capacity.

equipment <- scat_df_wide %>%
  select(city, equipment_1, equipment_2, equipment_3, limiting_factor)

tbl2 <- head(equipment) %>%
  kable() %>%
  kable_styling(position = "left", full_width = FALSE) %>%
  column_spec(1, bold = TRUE, border_right = TRUE, color = "black", background = "lightgrey")

tbl2
city equipment_1 equipment_2 equipment_3 limiting_factor
Bangkok LightCycler 480 PCR BSL-2 safety cabinet MagNA pure RNA extraction Additional personnel Pre-Analytical phase or longer working days
Chumpon Biorad CFX Opus 96 Qiagen Easy2 NA extraction BSL2 Biosafety cabinet Biorad CFX Opus 96 occupation or Additional Biorad CFX Opus 96
Udon_Thani Magnapure NA extraction Biorad CFX Opus 96 BSL2 Biosafety cabinet Additional Magnapure NA extraction
save_kable(tbl2,
  "./output/tbl2.html",
  bs_theme = "simplex",
  self_contained = TRUE,
  extra_dependencies = NULL,
  latex_header_includes = NULL,
  keep_tex = FALSE,
  density = 300
)

5 FAQ

Possible error codes.

Line 97 Error in 'mutate()':  
! Problem while computing '25h Surge Calculator = map(value,

You may have opened one or more an Excel files.
Solution: close all Excel files before running code.